On k-ary n-cubes: Theory and Applications

Weizhen Mao 2 and David M. Nicol 3 


Department of Computer Science 
The College of William and Mary 
Williamsburg, VA 23187-8795 
{wm, nicol}@cs.wm.edu


Abstract 

Many parallel processing networks can be viewed as graphs called k-ary n-cubes,
whose special cases include rings, hypercubes and toruses. In this paper, combinatorial
properties of k-ary n-cubes are explored. In particular, the problem of characterizing the
subgraph of a given number of nodes with the maximum edge count is studied. These
theoretical results are then used to compute a lower bounding function in branch-and-bound
partitioning algorithms and to establish the optimality of some irregular partitions.


J An extended abstract of this paper (without any proofs and missing some theorems) has been submit- 
1 Introduction



In a k-ary n-cube $G_{k,n}$, each node is identified by an n-digit base-k address $b_{n-1} \ldots b_i \ldots b_0$,
and for every dimension $i = 0, 1, \ldots, n-1$, it is connected by edges to the nodes with addresses
$b_{n-1} \ldots (b_i \pm 1 \bmod k) \ldots b_0$.

We can also define $G_{k,n}$ recursively. First, we define a ring of k nodes $0, 1, \ldots, k-1$ to
be a graph with edges between $i$ and $i + 1 \pmod k$ for $i = 0, 1, \ldots, k-1$. When $k = 1$, a
ring is a point. When $k = 2$, a ring is two nodes sharing an edge. When $k \ge 3$, a ring is a
conventional ring. The recursive definition of $G_{k,n}$ is as follows.

• $G_{k,1}$ is a ring of k nodes. Without loss of generality, we place the k nodes on a line,
and call the leftmost node the 0th position node and the rightmost node the (k-1)st
position node.

• $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$ placed from left to right. For each
position $i = 0, \ldots, k^{n-1} - 1$, edges between composite subcubes are defined by connecting
all k $i$th position nodes in a ring.


Further, $G_{k,n}$ can also be viewed as an n-dimensional (n-D) torus, which is a $k \times \cdots \times k$
cube of grids with wrap-around edges.

The second and the third definitions of $G_{k,n}$ provide two ways of drawing $G_{k,n}$. See Figure
1 for an example.


Table 1 shows special cases of $G_{k,n}$. The first column contains the values of k, and the
first row contains the values of n. We notice that the class $G_{k,n}$ contains many topologies
important to parallel processing, such as rings, hypercubes and toruses; hence a thorough
study of $G_{k,n}$ is worthwhile.

The following combinatorial properties of $G_{k,n}$ are easy to verify, except perhaps the last
one, whose proof we provide in the Appendix.


Property 1.1 $G_{k,n}$ has $k^n$ nodes.

Property 1.2 $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$, and the number of edges
with endpoints in different composite subcubes is $k^{n-1}$ for $k = 2$ and $k^n$ for $k \ge 3$.

Property 1.3 $G_{k,n}$ is a regular graph, meaning that each node has the same degree. The
degree of each node is $n$ for $k = 2$ and $2n$ for $k \ge 3$.

Property 1.4 The number of edges in $G_{k,n}$ is $n k^{n-1}$ for $k = 2$ and $n k^n$ for $k \ge 3$.

Property 1.5 In each $i$th composite subcube ($0 \le i \le k-1$) of type $G_{k,n-1}$ in $G_{k,n}$, choose
$m_i$ nodes, and define $m = \sum_{i=0}^{k-1} m_i$. The number of edges with endpoints among these m
nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$ for $k = 2$, and is no
larger than $m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for $k \ge 3$.
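Properties 1.1 to 1.4 can be checked mechanically on small instances. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, addresses are represented as tuples of n base-k digits, and the first digit is (arbitrarily) taken as the index of the composite subcube.

```python
from itertools import product

def neighbors(addr, k):
    """Neighbours of a node of G_{k,n}; addr is a tuple of n base-k digits."""
    result = set()
    for i, digit in enumerate(addr):
        for step in (1, -1):
            other = addr[:i] + ((digit + step) % k,) + addr[i + 1:]
            if other != addr:      # k = 1: a ring is a single point
                result.add(other)  # k = 2: +1 and -1 give the same neighbour
    return result

def build_cube(k, n):
    """Adjacency map of G_{k,n} (assumed representation: dict of sets)."""
    return {a: neighbors(a, k) for a in product(range(k), repeat=n)}

def edge_count(adj):
    """Each edge is seen from both endpoints, hence the division by two."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def cross_edges(adj):
    """Edges whose endpoints lie in different composite subcubes."""
    return sum(1 for a, nbrs in adj.items() for b in nbrs if a[0] != b[0]) // 2

if __name__ == "__main__":
    for k, n in [(2, 3), (3, 2), (5, 2)]:
        adj = build_cube(k, n)
        degrees = {len(nbrs) for nbrs in adj.values()}
        print(k, n, len(adj), degrees, edge_count(adj), cross_edges(adj))
```

For k = 2 the two directions along a dimension coincide, which is why the degree and edge counts differ from the k ≥ 3 case.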
Figure 1: A 3-ary 2-cube $G_{3,2}$

Table 1: Special cases of $G_{k,n}$

Properties of k-ary n-cubes related to VLSI concerns have been explored by Dally [4]. One
property that is related to our study is the bisection width of k-ary n-cubes, the minimum
number of edges one must cut when partitioning the graph into two subgraphs with equal
numbers of nodes. Our work considers a generalization of this notion: given that the partition
may contain P subgraphs, what is the minimum number of edges between the subgraphs?

The problem of partitioning graphs for parallel processing includes rigorous treatments
in [8, 11], where algorithms are developed that partition graphs with guarantees on the load
imbalance and the number of edges cut. Our work is similar in the sense of its rigor; restricted
to k-ary n-cubes, we give achievable lower bounds on partitioning costs.

We have previously studied properties of k-ary n-cubes in the context of load balancing
[10]. Here graph nodes typically represent computation and edges represent communication.
For any subgraph, define an internal edge to be one with two endpoints in the subgraph
and an external edge to be one with one endpoint in the subgraph; viewing the subgraph
as the set of nodes assigned to a processor, the number of external edges is a measure
of the communication cost. Allowing nodes and edges to be weighted (reflecting relative
computation and communication volumes, respectively), the “load” of a subgraph is taken
to be the sum of the weights of its nodes and its external edges. If $G_{k,n}$ is partitioned into
P subgraphs, the bottleneck cost of the partition is the maximum load among all partition
subgraphs [2, 12]. The bottleneck cost reflects that of one phase of a data parallel computation
where computation and communication are not overlapped, and a global synchronization
occurs at the end of the phase. The communication that occurs is needed for the subsequent
phase; there are no data dependencies among the computations performed in a given phase.
In a previous paper [10] we showed that certain equi-partitions are optimal in the sense of
minimizing the bottleneck cost, but that, surprisingly, there exist cases where the optimal
partition is not an equi-partition. These results are based on a lower bound on a processor’s
communication cost, a bound that is achieved for selected subgraph sizes. The current paper
completes that work by identifying an achievable bound for general subgraph sizes.
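The notions of load and bottleneck cost defined above can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the authors' code: the function name, the dictionary-based weight tables, and the 4-node ring used as an example.

```python
def bottleneck_cost(parts, node_weight, edge_weight):
    """Maximum load over all parts of a partition.

    parts       : iterable of node collections (one per processor)
    node_weight : dict mapping node -> weight
    edge_weight : dict mapping frozenset({a, b}) -> weight
    The load of a part is the weight of its nodes plus the weight of its
    external edges (edges with exactly one endpoint in the part).
    """
    loads = []
    for part in parts:
        members = set(part)
        load = sum(node_weight[v] for v in members)
        for edge, weight in edge_weight.items():
            if len(edge & members) == 1:   # exactly one endpoint inside
                load += weight
        loads.append(load)
    return max(loads)

# Example: the ring G_{4,1} with unit node and edge weights.
ring_nodes = {v: 1 for v in range(4)}
ring_edges = {frozenset((v, (v + 1) % 4)): 1 for v in range(4)}

if __name__ == "__main__":
    print(bottleneck_cost([{0, 1}, {2, 3}], ring_nodes, ring_edges))
    print(bottleneck_cost([{0}, {1, 2, 3}], ring_nodes, ring_edges))
```

In the ring example the even split has two external edges per part, while the uneven split leaves the larger part with three nodes and two external edges, so its bottleneck cost is higher.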

The problem of identifying the minimal communication cost (assuming unit edge weight)
of a subgraph of size m is the same as maximizing the number of internal edges in a subgraph
with m nodes, since each node in $G_{k,n}$ has the same degree. That is, we study the following
combinatorial problem.

Consider any subgraph $S_m$ of $m \le k^n$ nodes in $G_{k,n}$. Let $e(S_m)$ be the number
of internal edges in $S_m$. Define

$$e_k(m,n) = \max\{e(S_m)\}.$$

For any $m = 1, 2, \ldots, k^n$, determine $e_k(m,n)$, the maximum number of internal
edges in any subgraph $S_m$ in a k-ary n-cube.

We will say that a subgraph of $G_{k,n}$ with m nodes is optimal if it has $e_k(m,n)$ internal edges.
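For very small cubes, $e_k(m,n)$ can be evaluated directly from this definition by trying every m-node subset. The sketch below is only an illustration of the definition (the names `cube_edges` and `e_brute` are hypothetical), and its running time grows exponentially, so it is usable only as a sanity check.

```python
from itertools import combinations, product

def cube_edges(k, n):
    """Edge set of G_{k,n}, each edge stored as a frozenset of two addresses."""
    edges = set()
    for a in product(range(k), repeat=n):
        for i in range(n):
            b = a[:i] + ((a[i] + 1) % k,) + a[i + 1:]
            if b != a:                       # no self-loops when k = 1
                edges.add(frozenset((a, b)))
    return edges

def e_brute(k, n, m):
    """Maximum number of internal edges over all m-node subsets (exhaustive)."""
    nodes = list(product(range(k), repeat=n))
    edges = [tuple(e) for e in cube_edges(k, n)]
    best = 0
    for subset in combinations(nodes, m):
        chosen = set(subset)
        internal = sum(1 for a, b in edges if a in chosen and b in chosen)
        best = max(best, internal)
    return best

if __name__ == "__main__":
    print([e_brute(2, 3, m) for m in range(1, 9)])
    print([e_brute(3, 2, m) for m in range(1, 5)])
```

Because every node has the same degree d, the number of external edges of an m-node subgraph with e internal edges is d·m − 2e, which is why maximizing internal edges minimizes communication cost.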

The case $k = 1$ is trivial: $e_1(m,n) = 0$ for $m \le 1^n = 1$. In Sections 2, 3, and 4, we will
determine $e_2(m,n)$, $e_3(m,n)$, and $e_4(m,n)$, respectively. In Section 5, we will study the case
$k \ge 5$. In Section 6, we present two applications of the results developed; one uses these
results in the context of branch-and-bound algorithms for partitioning k-ary n-cubes with
generally weighted nodes and edges. Finally, we summarize our contributions in Section 7.
The Appendix contains the proofs of Property 1.5 and of the lemmas contributing to the main results.

2 The case k = 2 

To determine $e_2(m,n)$, the maximum number of internal edges of a subgraph with m nodes
in a hypercube, we will have to do some preliminary work.

Definition 2.1

$w(i)$ denotes the sum of all bits in the base-2 (binary) representation of $i$.

$W(i,j)$, $i \le j$, denotes the sum of $w(i), \ldots, w(j)$.

The following three lemmas concern properties of function W. Their proofs can be found
in the Appendix.

Lemma 2.1 $W(i, 2i-1) = W(0, i-1) + i$ for $i \ge 1$.

Lemma 2.2 $W(i+1, 2i) = W(0, i-1) + i$ for $i \ge 1$.

Lemma 2.3 $W(j, j+i-1) \ge W(0, i-1) + i$ for $j \ge i \ge 1$.

We next define a recursive function F and give its closed form in terms of W. 
Definition 2.2

$F(0) = F(1) = 0$;

$F(m) = F(\lceil m/2 \rceil) + F(\lfloor m/2 \rfloor) + \lfloor m/2 \rfloor$ for $m \ge 2$.

Theorem 2.1 $F(m) = W(0, m-1)$ for $m \ge 1$.

Proof We induct on m. When $m = 1$, $F(1) = W(0,0) = 0$. Assume that the equation holds
for all values $\le m-1$. Now consider m.

Case 1. $m = 2i$ for some $i \ge 1$.

$F(m) = F(i) + F(i) + i$ (Definition 2.2)

$= W(0, i-1) + W(0, i-1) + i$ (Inductive hypothesis)

$= W(0, i-1) + W(i, 2i-1)$ (Lemma 2.1)

$= W(0, 2i-1)$

$= W(0, m-1)$.

Case 2. $m = 2i+1$ for some $i \ge 1$.

$F(m) = F(i+1) + F(i) + i$ (Definition 2.2)

$= W(0, i) + W(0, i-1) + i$ (Inductive hypothesis)

$= W(0, i) + W(i+1, 2i)$ (Lemma 2.2)

$= W(0, 2i)$

$= W(0, m-1)$. ■
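Theorem 2.1 and the two lemmas used in its proof are easy to confirm numerically. The sketch below is an illustrative check with assumed function names: `w` counts the 1-bits of an integer, `W` sums `w` over a range, and `F` follows Definition 2.2.

```python
from functools import lru_cache

def w(i):
    """Sum of the bits in the binary representation of i."""
    return bin(i).count("1")

def W(i, j):
    """w(i) + w(i+1) + ... + w(j), for i <= j."""
    return sum(w(x) for x in range(i, j + 1))

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2."""
    if m <= 1:
        return 0
    half = m // 2                       # floor(m/2); m - half is ceil(m/2)
    return F(m - half) + F(half) + half

if __name__ == "__main__":
    # Theorem 2.1: F(m) = W(0, m-1)
    print(all(F(m) == W(0, m - 1) for m in range(1, 300)))
    # Lemma 2.1 and Lemma 2.2
    print(all(W(i, 2 * i - 1) == W(0, i - 1) + i for i in range(1, 300)))
    print(all(W(i + 1, 2 * i) == W(0, i - 1) + i for i in range(1, 300)))
```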

Corollary 2.1 $F(m) \ge F(m_0) + F(m_1) + \min\{m_0, m_1\}$ for $m_0 + m_1 = m$.

Proof If at least one of $m_0$ and $m_1$ is 0, the inequality holds trivially. Now assume that
$m_0 \ge m_1 \ge 1$.

$F(m) = W(0, m-1)$ (Theorem 2.1)

$= W(0, m_0 - 1) + W(m_0, m-1)$

$\ge W(0, m_0 - 1) + W(0, m_1 - 1) + m_1$ (Lemma 2.3)

$= F(m_0) + F(m_1) + m_1$ (Theorem 2.1). ■

Corollary 2.2 $F(m) = \frac{1}{2} m \log_2 m$ if $m = 2^l$ for some $l$.

Proof Use Definition 2.2 and induction on m. ■

It turns out that F(m) exactly captures the quantity of interest. 

Theorem 2.2 $e_2(m,n) = F(m)$ for $m \le 2^n$.
Figure 2: Subgraphs of $G_{2,n}$ achieving internal edge count $F(m)$

Proof Since $G_{2,n}$ contains two composite subcubes of type $G_{2,n-1}$, assume that $m_0$ and $m_1$
nodes are chosen in the 0th and 1st composite subcubes, respectively. By Property 1.5,

$e_2(0,n) = e_2(1,n) = 0$;

$e_2(m,n) \le \max_{m_0 + m_1 = m} \{ e_2(m_0, n-1) + e_2(m_1, n-1) + \min\{m_0, m_1\} \}$.

First we prove by induction on m that $e_2(m,n) \le F(m)$. When $m = 0, 1$, $e_2(m,n) =
F(m) = 0$. Assume that the inequality holds for all values $\le m-1$. Now consider m.

$e_2(m,n) \le \max_{m_0 + m_1 = m} \{ e_2(m_0, n-1) + e_2(m_1, n-1) + \min\{m_0, m_1\} \}$

$\le \max_{m_0 + m_1 = m} \{ F(m_0) + F(m_1) + \min\{m_0, m_1\} \}$ (Inductive hypothesis)

$\le F(m)$ (Corollary 2.1).

Next we prove that there is a subgraph $S^*_m$ of m nodes such that the number of internal
edges in $S^*_m$ is $F(m)$. Here is how we can allocate the m nodes for $S^*_m$: allocate $\lceil m/2 \rceil$ nodes
into the 0th composite subcube and $\lfloor m/2 \rfloor$ nodes into the 1st composite subcube; use the same
method recursively to allocate the nodes in each composite subcube. It is obvious that the
number of internal edges in $S^*_m$ is exactly $F(m)$. ■
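The recursive allocation used in the second half of the proof can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure verbatim: addresses are tuples of n bits, the leading bit selects the composite subcube, and the names `allocate` and `internal_edges` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Definition 2.2."""
    if m <= 1:
        return 0
    return F(m - m // 2) + F(m // 2) + m // 2

def allocate(m, n):
    """Addresses of m nodes of G_{2,n} chosen by recursive halving."""
    if m > 2 ** n:
        raise ValueError("m must not exceed 2**n")
    if m == 0:
        return []
    if n == 0:
        return [()]                      # the single node of G_{2,0}
    big, small = m - m // 2, m // 2      # ceil(m/2) and floor(m/2)
    return ([(0,) + a for a in allocate(big, n - 1)] +
            [(1,) + a for a in allocate(small, n - 1)])

def internal_edges(nodes):
    """Count hypercube edges with both endpoints in `nodes`."""
    chosen = set(nodes)
    count = 0
    for a in chosen:
        for i, bit in enumerate(a):
            # count each edge once, from its endpoint with a 0 in position i
            if bit == 0 and a[:i] + (1,) + a[i + 1:] in chosen:
                count += 1
    return count

if __name__ == "__main__":
    n = 4
    print([(m, internal_edges(allocate(m, n)), F(m)) for m in range(2 ** n + 1)])
```

The construction works because the node set chosen for $\lfloor m/2 \rfloor$ is always contained in the set chosen for $\lceil m/2 \rceil$, so all $\lfloor m/2 \rfloor$ possible edges between the two subcubes are present.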

This theorem tells us about the structure of a subgraph with exactly $F(m)$ internal
edges: it is possible to bisect this subgraph “evenly” with exactly $\lfloor m/2 \rfloor$ edges between the
two pieces, which are themselves optimal with respect to their sizes. Figure 2 illustrates
optimal subgraphs of $G_{2,n}$ for $m = 3, 4, 5, 6$.


3 The case k = 3

Similar to the previous section, to determine $e_3(m,n)$ for $G_{3,n}$ we will have to do some
preliminary work.

Definition 3.1

$z(i)$ denotes the sum of all digits in the base-3 representation of $i$.

$Z(i,j)$, $i \le j$, denotes the sum of $z(i), \ldots, z(j)$.

The following five lemmas concern the properties of function Z. Their proofs can be
found in the Appendix.

Lemma 3.1 $Z(0, 3i-1) = 3Z(0, i-1) + 3i$ for $i \ge 1$.

Lemma 3.2 $Z(0, 3i) = Z(0, i) + 2Z(0, i-1) + 3i$ for $i \ge 1$.

Lemma 3.3 $Z(0, 3i+1) = 2Z(0, i) + Z(0, i-1) + 3i + 1$ for $i \ge 1$.

Lemma 3.4 $Z(j, j+i-1) \ge Z(0, i-1) + i$ for $j \ge i \ge 1$.

Lemma 3.5 $Z(j, j+i_1+i_2-1) \ge Z(0, i_1-1) + Z(0, i_2-1) + i_1 + 2i_2$ for $j \ge i_1 \ge i_2 \ge 1$.

We next define a recursive function G and give its closed form in terms of Z. 

Definition 3.2

$G(0) = G(1) = 0$;

$G(m) = (m \bmod 3)\, G(\lceil m/3 \rceil) + (3 - m \bmod 3)\, G(\lfloor m/3 \rfloor) + m - \lceil m/3 \rceil + \lfloor m/3 \rfloor$ for $m \ge 2$.

Theorem 3.1 $G(m) = Z(0, m-1)$ for $m \ge 1$.

Proof Similar to the proof of Theorem 2.1. In the inductive step, we consider three cases:
$m = 3i$, $m = 3i+1$, and $m = 3i+2$, and use Lemmas 3.1, 3.2, and 3.3 in the three cases,
respectively. ■
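As with the binary case, Theorem 3.1 and Lemmas 3.1 to 3.3 can be checked numerically. The following sketch uses assumed names (`z`, `Z`, `G`) that mirror Definitions 3.1 and 3.2; it is a sanity check, not a proof.

```python
from functools import lru_cache

def z(i):
    """Sum of the digits in the base-3 representation of i."""
    total = 0
    while i:
        total += i % 3
        i //= 3
    return total

def Z(i, j):
    """z(i) + z(i+1) + ... + z(j), for i <= j."""
    return sum(z(x) for x in range(i, j + 1))

@lru_cache(maxsize=None)
def G(m):
    """Recursive function of Definition 3.2."""
    if m <= 1:
        return 0
    low = m // 3           # floor(m/3)
    high = -(-m // 3)      # ceil(m/3)
    r = m % 3
    return r * G(high) + (3 - r) * G(low) + m - high + low

if __name__ == "__main__":
    print(all(G(m) == Z(0, m - 1) for m in range(1, 300)))
    print(all(Z(0, 3 * i - 1) == 3 * Z(0, i - 1) + 3 * i for i in range(1, 100)))
    print(all(Z(0, 3 * i) == Z(0, i) + 2 * Z(0, i - 1) + 3 * i for i in range(1, 100)))
    print(all(Z(0, 3 * i + 1) == 2 * Z(0, i) + Z(0, i - 1) + 3 * i + 1
              for i in range(1, 100)))
```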

Corollary 3.1 $G(m) \ge G(m_0) + G(m_1) + G(m_2) + m - \max\{m_0, m_1, m_2\} + \min\{m_0, m_1, m_2\}$
for $m_0 + m_1 + m_2 = m$.

Proof If at least two of $m_0$, $m_1$ and $m_2$ are 0, the inequality holds trivially. If only one, say
$m_2$, is 0, assuming that $m_0 \ge m_1 \ge 1$, the derivation is almost identical to the same case in
the proof of Corollary 2.1 except here we use G instead of F and Z instead of W. If none of
$m_0$, $m_1$ and $m_2$ is 0, assuming that $m_0 \ge m_1 \ge m_2 \ge 1$, we have

$G(m) = Z(0, m-1)$ (Theorem 3.1)

$= Z(0, m_0 - 1) + Z(m_0, m-1)$

$\ge Z(0, m_0 - 1) + Z(0, m_1 - 1) + Z(0, m_2 - 1) + m_1 + 2m_2$ (Lemma 3.5)

$= G(m_0) + G(m_1) + G(m_2) + m - m_0 + m_2$ (Theorem 3.1). ■

Corollary 3.2 $G(m) = m \log_3 m$ if $m = 3^l$ for some $l$.

Proof Use Definition 3.2 and induction on m. ■

It turns out that G(m) exactly captures the quantity of interest. 

Theorem 3.2 $e_3(m,n) = G(m)$ for $m \le 3^n$.

Proof Since $G_{3,n}$ contains three composite subcubes of type $G_{3,n-1}$, assume that $m_0$, $m_1$ and
$m_2$ nodes are chosen in the 0th, 1st and 2nd composite subcubes, respectively. By Property
1.5,

$e_3(0,n) = e_3(1,n) = 0$;

$e_3(m,n) \le \max_{m_0 + m_1 + m_2 = m} \{ e_3(m_0, n-1) + e_3(m_1, n-1) + e_3(m_2, n-1) + m - \max\{m_0, m_1, m_2\} + \min\{m_0, m_1, m_2\} \}$.

Similar to Theorem 2.2, we can prove by induction on m that $e_3(m,n) \le G(m)$, using
the above recursive bound on $e_3(m,n)$, the inductive hypothesis, and Corollary 3.1.

Also similar to Theorem 2.2, a subgraph $S^*_m$ of m nodes with $G(m)$ internal edges can be
constructed by allocating $\lceil m/3 \rceil$ nodes into each of the first $m \bmod 3$ composite subcubes and
$\lfloor m/3 \rfloor$ nodes into each of the remaining composite subcubes; the same method is then used
recursively to allocate the nodes in each composite subcube. ■

4 The case k = 4

Similar to the previous two sections, to determine $e_4(m,n)$ for $G_{4,n}$ we will have to do some
preliminary work. The following four lemmas concern additional properties of function W.
Their proofs can be found in the Appendix.

Lemma 4.1 $W(0, 4i-1) = 4W(0, i-1) + 4i$ for $i \ge 1$.

Lemma 4.2 $W(0, 4i) = W(0, i) + 3W(0, i-1) + 4i$ for $i \ge 1$.

Lemma 4.3 $W(0, 4i+1) = 2W(0, i) + 2W(0, i-1) + 4i + 1$ for $i \ge 1$.

Lemma 4.4 $W(0, 4i+2) = 3W(0, i) + W(0, i-1) + 4i + 2$ for $i \ge 1$.

We next define a recursive function H and show that it is the same function as F defined 
in Section 2. 

Definition 4.1

$H(0) = H(1) = 0$;

$H(m) = (m \bmod 4)\, H(\lceil m/4 \rceil) + (4 - m \bmod 4)\, H(\lfloor m/4 \rfloor) + m - \lceil m/4 \rceil + \lfloor m/4 \rfloor$ for $m \ge 2$.

Theorem 4.1 $H(m) = W(0, m-1)$ for $m \ge 1$.

Proof Similar to the proof of Theorem 2.1. In the inductive step, we consider four cases:
$m = 4i$, $m = 4i+1$, $m = 4i+2$, and $m = 4i+3$, and use Lemmas 4.1, 4.2, 4.3, and 4.4 in
the four cases, respectively. ■
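Theorems 2.1 and 4.1 together say that H and F are the same function, even though their recursions split m differently. A short numerical check, with function names chosen here for illustration, makes this concrete.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Definition 2.2 (two-way split)."""
    if m <= 1:
        return 0
    return F(m - m // 2) + F(m // 2) + m // 2

@lru_cache(maxsize=None)
def H(m):
    """Definition 4.1 (four-way split)."""
    if m <= 1:
        return 0
    low = m // 4           # floor(m/4)
    high = -(-m // 4)      # ceil(m/4)
    r = m % 4
    return r * H(high) + (4 - r) * H(low) + m - high + low

def W(i, j):
    """Sum of binary digit sums w(i) + ... + w(j)."""
    return sum(bin(x).count("1") for x in range(i, j + 1))

if __name__ == "__main__":
    print(all(H(m) == F(m) == W(0, m - 1) for m in range(1, 500)))
    # Lemma 4.1 as a spot check
    print(all(W(0, 4 * i - 1) == 4 * W(0, i - 1) + 4 * i for i in range(1, 100)))
```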

Corollary 4.1 $H(m) \ge H(m_0) + H(m_1) + H(m_2) + H(m_3) + m - \max\{m_0, m_1, m_2, m_3\} +
\min\{m_0, m_1, m_2, m_3\}$ for $m_0 + m_1 + m_2 + m_3 = m$.

Proof If at least three of $m_0$, $m_1$, $m_2$ and $m_3$ are 0, the inequality holds trivially. If only
two, say $m_2$ and $m_3$, are 0, assuming that $m_0 \ge m_1 \ge 1$, the derivation is almost identical
to the same case in the proof of Corollary 2.1 except here we use H instead of F. If at most
one of $m_0$, $m_1$, $m_2$ and $m_3$ is 0, assuming that $m_0 \ge m_1 \ge m_2 \ge m_3$, we have

$H(m) = F(m)$ (Theorems 2.1 and 4.1)

$\ge F(m_0 + m_1) + F(m_2 + m_3) + m_2 + m_3$ (Corollary 2.1)

$\ge F(m_0) + F(m_1) + F(m_2) + F(m_3) + m_1 + m_2 + 2m_3$ (Corollary 2.1)

$= H(m_0) + H(m_1) + H(m_2) + H(m_3) + m_1 + m_2 + 2m_3$ (Theorem 4.1). ■

Corollary 4.2 $H(m) = m \log_4 m$ if $m = 4^l$ for some $l$.

Proof Use Definition 4.1 and induction on m. ■

Theorem 4.2 $e_4(m,n) = H(m)$ for $m \le 4^n$.

Proof Since $G_{4,n}$ contains four composite subcubes of type $G_{4,n-1}$, assume that $m_0$, $m_1$, $m_2$
and $m_3$ nodes are chosen in the 0th, 1st, 2nd and 3rd composite subcubes, respectively. By
Property 1.5,

$e_4(0,n) = e_4(1,n) = 0$;

$e_4(m,n) \le \max_{m_0 + m_1 + m_2 + m_3 = m} \{ e_4(m_0, n-1) + e_4(m_1, n-1) + e_4(m_2, n-1) + e_4(m_3, n-1) + m - \max\{m_0, m_1, m_2, m_3\} + \min\{m_0, m_1, m_2, m_3\} \}$.

Similar to Theorem 2.2, we can prove by induction on m that $e_4(m,n) \le H(m)$, using
the above recursive bound on $e_4(m,n)$, the inductive hypothesis, and Corollary 4.1.

Also similar to Theorem 2.2, a subgraph $S^*_m$ of m nodes with $H(m)$ internal edges can be
constructed by allocating $\lceil m/4 \rceil$ nodes into each of the first $m \bmod 4$ composite subcubes and
$\lfloor m/4 \rfloor$ nodes into each of the remaining composite subcubes; the same method is then used
recursively to allocate the nodes in each composite subcube. ■

5 The case k ≥ 5

Given that essentially the same approach defines the structure of optimal subgraphs for three
successive values of k, one might suspect a general pattern for all k. It turns out that this
is not the case and that for $k \ge 5$ the decomposition that once defined optimal subgraphs
now defines suboptimal ones. Consider the example of $k = 5$, $m = 6$. If we partition in one
dimension into one subgraph of two nodes and four subgraphs of one node each, we achieve
six internal edges (a ring of five nodes, with one extra node hanging off the ring). However,
it is possible to embed the six-node graph illustrated in Figure 2 into $G_{5,n}$, and achieve seven
internal edges. An ability to embed subgraphs of $G_{2,n}$ into $G_{k,n}$ turns out to be what is
needed to characterize the optimal subgraphs of $G_{k,n}$ with m nodes, when $k \ge 5$ and $m \le 2^n$.
To prove this, we need the following theorem.
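The k = 5, m = 6 example can be reproduced by counting edges in $G_{5,2}$ directly. The sketch below is illustrative: the two node sets are one possible choice of coordinates (an assumption made here), and `internal_edges` is a hypothetical helper that counts torus edges inside a node set.

```python
def internal_edges(nodes, k):
    """Number of edges of G_{k,n} with both endpoints in `nodes`."""
    chosen = set(nodes)
    edges = set()
    for a in chosen:
        for i in range(len(a)):
            b = a[:i] + ((a[i] + 1) % k,) + a[i + 1:]
            if b != a and b in chosen:
                edges.add(frozenset((a, b)))
    return len(edges)

K = 5

# One subcube gets two nodes, the other four get one node each:
# a ring of five nodes plus one node hanging off the ring.
ring_plus_one = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1)]

# The six-node hypercube subgraph (two squares sharing an edge),
# embedded as a 3 x 2 block that uses no wrap-around edge.
embedded_block = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]

if __name__ == "__main__":
    print(internal_edges(ring_plus_one, K))   # ring decomposition
    print(internal_edges(embedded_block, K))  # hypercube-style embedding
```

The second count equals $F(6) = 7$, the value attained in the hypercube.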
Theorem 5.1 $F(m) \ge \sum_{i=0}^{k-1} F(m_i) + m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for $\sum_{i=0}^{k-1} m_i = m$.

Proof Assume that $m_0 \ge m_1 \ge \cdots \ge m_{k-1} \ge 0$. Let $l$ be the smallest index such that
$\sum_{i=0}^{l} m_i \ge \frac{m}{2}$. Clearly, $\sum_{i=0}^{l-1} m_i < \frac{m}{2}$ and $\sum_{i=l}^{k-1} m_i > \frac{m}{2}$. This also implies that $l < k - l$. So
$l < \frac{k}{2}$.

$F(m) \ge F\left(\sum_{i=0}^{l} m_i\right) + F\left(\sum_{i=l+1}^{k-1} m_i\right) + \min\left\{\sum_{i=0}^{l} m_i, \sum_{i=l+1}^{k-1} m_i\right\}$ (Corollary 2.1)

$\ge \sum_{i=0}^{k-1} F(m_i) + A + B + C$ (Corollary 2.1 repeatedly),

where

$A = \min\left\{\sum_{i=0}^{l} m_i, \sum_{i=l+1}^{k-1} m_i\right\}$,

$B = \sum_{i=0}^{l-1} \min\left\{m_i, \sum_{j=i+1}^{l} m_j\right\}$,

and

$C = \sum_{i=l+1}^{k-2} \min\left\{m_i, \sum_{j=i+1}^{k-1} m_j\right\}$.

Next, we wish to prove that $A + B + C \ge m - m_0 + m_{k-1}$. Since $\sum_{i=0}^{l} m_i \ge \frac{m}{2}$, $A =
\sum_{i=l+1}^{k-1} m_i$. Since $l < \frac{k}{2}$ and $k \ge 5$, $l + 1 \le k - 2$. So there is at least one term in C. Therefore,
$C \ge m_{k-1}$. How large is B? If $l = 0$, then $B = 0$ and $A + B + C \ge \sum_{i=1}^{k-1} m_i + m_{k-1} = m - m_0 +
m_{k-1}$. If $l = 1$, then $B = m_1$ and $A + B + C \ge \sum_{i=2}^{k-1} m_i + m_1 + m_{k-1} = m - m_0 + m_{k-1}$. Now
assume that $l \ge 2$. B must have at least two terms. If $m_h \le \sum_{i=h+1}^{l} m_i$ for all $h = 0, \ldots, l-2$,
then $B = \sum_{i=0}^{l-2} m_i + m_l$ and $A + B + C \ge \sum_{i=l+1}^{k-1} m_i + \sum_{i=0}^{l-2} m_i + m_l + m_{k-1} \ge m - m_0 + m_{k-1}$.
If there is $h$ in $[0, l-2]$ such that $m_h > \sum_{i=h+1}^{l} m_i$ (choose the smallest $h$ if there is more than
one), then $B \ge \sum_{i=0}^{h-1} m_i + \sum_{i=h+1}^{l} m_i$ and $A + B + C \ge \sum_{i=l+1}^{k-1} m_i + \sum_{i=0}^{h-1} m_i + \sum_{i=h+1}^{l} m_i + m_{k-1}
\ge m - m_0 + m_{k-1}$. ■
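The inequality of Theorem 5.1 can be checked exhaustively for small m. The following sketch, with hypothetical helper names, enumerates all ways of splitting m into k non-negative parts (up to order, which does not affect the bound) and also shows why the statement cannot hold for k = 3.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Definition 2.2."""
    if m <= 1:
        return 0
    return F(m - m // 2) + F(m // 2) + m // 2

def partitions(m, parts, largest=None):
    """Non-increasing tuples of `parts` non-negative integers summing to m."""
    if largest is None:
        largest = m
    if parts == 1:
        if m <= largest:
            yield (m,)
        return
    for first in range(min(m, largest), -1, -1):
        for rest in partitions(m - first, parts - 1, first):
            yield (first,) + rest

def bound(sizes):
    """Right-hand side of Theorem 5.1 for the given sizes m_0, ..., m_{k-1}."""
    return sum(F(x) for x in sizes) + sum(sizes) - max(sizes) + min(sizes)

def holds_up_to(k, limit):
    """True if F(m) >= bound(sizes) for every split of every m <= limit."""
    return all(F(m) >= bound(sizes)
               for m in range(limit + 1)
               for sizes in partitions(m, k))

if __name__ == "__main__":
    print(holds_up_to(5, 24))
    print(F(3), bound((1, 1, 1)))   # k = 3: a triangle beats F(3)
```

The k = 3 counterexample is the ring of three nodes, which is why that case needs the separate function G of Section 3.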


Theorem 5.2 $e_k(m,n) = F(m)$ for $m \le 2^n$ and $k \ge 5$.

Proof Since $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$, assume that $m_i$ nodes are
chosen in the $i$th composite subcube for $0 \le i \le k-1$. By Property 1.5,

$e_k(0,n) = e_k(1,n) = 0$;

$e_k(m,n) \le \max_{\sum_{i=0}^{k-1} m_i = m} \left\{ \sum_{i=0}^{k-1} e_k(m_i, n-1) + m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\} \right\}$.

Similar to Theorem 2.2, we can prove by induction on m that $e_k(m,n) \le F(m)$, using
the above recursive bound on $e_k(m,n)$, the inductive hypothesis, and Theorem 5.1.
Figure 3: Construction procedure for $C_2(m)$

Also similar to Theorem 2.2, a subgraph $S^*_m$ of $m \le 2^n$ nodes with $F(m)$ internal edges
can be constructed by allocating $\lceil m/2 \rceil$ nodes into the 0th composite subcube and $\lfloor m/2 \rfloor$ nodes
into the 1st composite subcube; the same method is then used recursively to allocate the
nodes in each composite subcube. ■

What then of subgraphs of size $m > 2^n$? For this case we assume that either k is so large
relative to m that an optimal subgraph cannot include wrap-around edges, or that the graph
of interest is a mesh (without wrap-around edges) whose local structure is like that of $G_{k,n}$. In
other words, we now also consider multi-dimensional rectangular meshes, structures we will
call n-D meshes. Intuition tells us that the maximum number of internal edges $e_k(m,n)$ can
be reached when the m nodes are placed as tightly as possible to form a “cubish” polyhedron.
In the remainder of this section, we shall prove that our intuition turns out to be correct.

In any dimension $i$, a subgraph of m nodes can be partitioned into layers, each of which
contains nodes with the same coordinate in dimension $i$. Furthermore, there may be edges
(legs) between adjacent layers. We give the following definition of a cubish polyhedron.

Definition 5.1

For any $m \ge 2$, there exist $l \ge 2$ and $1 \le i \le n$ such that $l^{i-1}(l-1)^{n-i+1} < m \le
l^{i}(l-1)^{n-i}$. Let $\delta = m - l^{i-1}(l-1)^{n-i+1}$. The n-D cubish polyhedron of m nodes in $G_{k,n}$,
denoted as $C_n(m)$, is defined recursively as follows.

• $C_1(m)$ is a line of m nodes.

• To construct $C_n(m)$, we start with an $\underbrace{l \times \cdots \times l}_{i-1} \times \underbrace{(l-1) \times \cdots \times (l-1)}_{n-i+1}$ n-D mesh. For
the remaining $\delta$ nodes, we construct an (n-1)-D layer $C_{n-1}(\delta)$ and add it on the top
of the n-D mesh in dimension $i$.

The above procedure of constructing $C_n(m)$ is very much like making a ball of yarn.
The idea is to fill in each side (dimension) with yarn (nodes), one side (dimension) at a
time. Figure 3 illustrates the construction procedure for $C_2(m)$, and Figure 4 illustrates the
procedure for $C_3(m)$. Let $e_n(m)$ be the internal edge count in a cubish polyhedron $C_n(m)$.
Obviously, $e_n(m) = [e_{n-1}(\delta) + \delta] + e_n(m - \delta)$.
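The recurrence for $e_n(m)$ translates directly into code. The sketch below is a hedged illustration: `shape_parameters` finds $(l, i, \delta)$ of Definition 5.1 by simple search, `e_cubish` applies the recurrence, and both names are hypothetical. As an outside sanity check (not taken from this paper), the 2-D values are compared with the known closed form $2m - \lceil 2\sqrt{m}\, \rceil$ for the maximum number of edges among m nodes of the square grid.

```python
from functools import lru_cache
from math import isqrt

def shape_parameters(m, n):
    """Return (l, i, delta) of Definition 5.1 for m >= 2 and n >= 1."""
    l = 2
    while True:
        for i in range(1, n + 1):
            low = l ** (i - 1) * (l - 1) ** (n - i + 1)
            high = l ** i * (l - 1) ** (n - i)
            if low < m <= high:
                return l, i, m - low
        l += 1

@lru_cache(maxsize=None)
def e_cubish(n, m):
    """Internal edge count e_n(m) of the cubish polyhedron C_n(m)."""
    if m <= 1 or n == 0:
        return 0
    if n == 1:
        return m - 1                     # a line of m nodes
    _, _, delta = shape_parameters(m, n)
    # new layer C_{n-1}(delta), its delta legs, and the remaining full mesh
    return e_cubish(n - 1, delta) + delta + e_cubish(n, m - delta)

def grid_closed_form(m):
    """2m - ceil(2*sqrt(m)), computed with integer arithmetic."""
    return 2 * m - (isqrt(4 * m - 1) + 1)

if __name__ == "__main__":
    print([e_cubish(2, m) for m in range(1, 17)])
    print(all(e_cubish(2, m) == grid_closed_form(m) for m in range(1, 200)))
    print(e_cubish(3, 8), e_cubish(3, 27))   # 2x2x2 and 3x3x3 cubes
```

The search in `shape_parameters` terminates because the intervals $(l^{i-1}(l-1)^{n-i+1},\ l^{i}(l-1)^{n-i}]$ tile all integers $m \ge 2$ as $l$ and $i$ increase.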
Figure 4: Construction procedure for $C_3(m)$

Figure 5: Rearrange $S_m$ (in $G_{k,2}$) without decreasing $e(S_m)$

Theorem 5.3 $C_n(m)$ has the maximum internal edge count among all subgraphs $S_m$ of m
nodes in $G_{k,n}$ (or in n-D meshes), when the wrap-around edges can be discounted.

Proof We prove by induction on n. When $n = 1$, the claim is trivially true. Assume that
the claim holds true for $n - 1$. Now consider the case of n. Let $S_m$ be any subgraph of m
nodes with $e(S_m)$ internal edges in $G_{k,n}$. We wish to prove that $e(S_m) \le e_n(m)$.

We can view $S_m$ as having several (n-1)-D layers of nodes stacked on each other in a
certain dimension. Rearrange the order of the layers by sizes (node counts) and within each
layer rearrange the nodes into an (n-1)-D cubish polyhedron. See Figure 5 for an example
(the numbers in the figure are the sizes of the layers). If after this rearrangement there are
$h$ layers and $s_i$ is the size of the $i$th layer with $s_1 \le s_2 \le \cdots \le s_h$, then by the inductive
hypothesis we have

$e(S_m) \le [e_{n-1}(s_1) + s_1] + [e_{n-1}(s_2) + s_2] + \cdots + [e_{n-1}(s_{h-1}) + s_{h-1}] + e_{n-1}(s_h)$.

Note that $s_1 + s_2 + \cdots + s_{h-1}$ is the number of edges (legs) between adjacent layers.

We have a few observations about the new subgraph obtained. First, layers in each
dimension (not just the dimension chosen in the rearrangement) are stacked on each other
by sizes. Second, $h \ge l$. Assume that $h \le l - 1$ for all dimensions. We must have $m \le
(l-1)^n$, which is impossible. Third, $s_1 \le l^{i-1}(l-1)^{n-i}$. Suppose not. We must have
$m = s_1 + \cdots + s_h \ge h s_1 \ge l s_1 > l^{i}(l-1)^{n-i}$, which is impossible.
Let us go back to the induction step, in which we assume that $e_{n-1}(m)$ is maximum and
wish to prove that $e_n(m)$ is maximum. We need another induction on m to prove this. When
$m = 1, 2$, $e_n(m)$ is obviously maximum. Assume that $e_n(j)$ is maximum for $j \le m - 1$. Now
consider the case $j = m$. We know by the inductive hypothesis that

$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1)$.

By Definition 5.1, we know that $C_n(m - \delta)$ is in fact an n-D mesh with $l^{i-1}(l-1)^{n-i+1}$
nodes. $C_n(m - \delta)$ can also be viewed as having $l$ (or $l - 1$ if $i = 1$) layers stacked on each
other, where each layer is an (n-1)-D mesh and has $L$ nodes. Clearly,

$$L = \begin{cases} (l-1)^{n-1} & \text{if } i = 1; \\ l^{i-2}(l-1)^{n-i+1} & \text{if } i \ge 2. \end{cases}$$

We can show that $s_1 \le L + \delta$. Suppose not. We must have $m \ge h s_1 \ge l s_1 > lL + l\delta >
lL + \delta \ge m$, which is impossible. To continue, we consider two cases.

Case 1. $s_1 \le \delta$. We must have $l^{i-1}(l-1)^{n-i+1} \le m - s_1 < l^{i}(l-1)^{n-i}$. Let $m - s_1 =
l^{i-1}(l-1)^{n-i+1} + \delta'$. Then $s_1 + \delta' = \delta$. So

$e_n(m - s_1) = [e_{n-1}(\delta') + \delta'] + e_n(l^{i-1}(l-1)^{n-i+1})$

and

$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(\delta)$.

Therefore,

$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1)$

$= [e_{n-1}(s_1) + s_1] + [e_{n-1}(\delta') + \delta'] + e_n(l^{i-1}(l-1)^{n-i+1})$

$\le [e_{n-1}(\delta) + \delta] + e_n(l^{i-1}(l-1)^{n-i+1})$

$= e_n(m)$.

Case 2. $s_1 > \delta$. Since $s_1 \le L + \delta$, we must have $(l'-1)L \le m - s_1 < l'L$, where $l' = l - 1$
if $i = 1$ and $l' = l$ if $i \ge 2$. Let $m - s_1 = (l'-1)L + \delta'$, where $\delta' \le L$. Then $s_1 + \delta' = L + \delta$.
So

$e_n(m - s_1) = [e_{n-1}(\delta') + \delta'] + e_n((l'-1)L)$.

Therefore,

$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1)$

$= [e_{n-1}(s_1) + s_1] + [e_{n-1}(\delta') + \delta'] + e_n((l'-1)L)$.

On the other hand, we have

$e_n(m) = [e_{n-1}(\delta) + \delta] + [e_{n-1}(L) + L] + e_n((l'-1)L)$.
To show that $e(S_m) \le e_n(m)$, all we need to prove is that for $s_1 + \delta' = L + \delta$,

$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta)$.

The inequality is trivially true when $s_1 = L$. Let us consider the following subcases.

Subcase 2.1. $s_1 < L$. We will prove by yet another induction on dimension $n - 1$ that
$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta)$, where $s_1, \delta' \le L$ and $s_1 + \delta' = L + \delta$. When $n - 1 = 1$,
it is a trivial case. Assume that the inequality holds for dimension $n - 2$. Now consider the
case of $n - 1$. Without loss of generality, assume that $s_1 \ge \delta'$ (the case $s_1 < \delta'$ is symmetric).
Initialize A and B to be $C_{n-1}(s_1)$ and $C_{n-1}(\delta')$, respectively. The node count in A, denoted
as $|A|$, is then $s_1$ and $|B|$ is $\delta'$. Consider A as a cubish polyhedron of several (n-2)-D layers
of size $L'$ each plus one more layer of $a \le L'$ nodes and $a$ legs on the top, and B as a cubish
polyhedron of several (n-2)-D layers of size $L''$ each plus one more layer of $b \le L''$ nodes
and $b$ legs on the top. Since $|A| \ge |B|$, A completely includes B. So $L' \ge L''$. We next apply
the following step to move nodes from B to A. If there is a layer in B with size no greater
than $L - |A|$, move the layer together with its legs to A and rearrange the two polyhedrons into
cubish polyhedrons again (note that after the move A, B, $L'$, $L''$, $a$, and $b$ are updated). It is
clear that this step does not decrease the total edge count in the two polyhedrons. Apply the
above step until for any layer in B its size is larger than $L - |A|$. We must have $L - |A| < L'$
and $a + b > L'$. Since $a, b \le L'$, by the inductive hypothesis,

$e_{n-2}(a) + e_{n-2}(b) \le e_{n-2}(L') + e_{n-2}(a + b - L')$.

Removing the top layer of $a$ nodes and the top layer of $b$ nodes from A and B, respectively,
and adding a layer of $L'$ nodes and a layer of $a + b - L'$ nodes to A and B, respectively, we
get $|A| = L$ and $|B| = \delta$. So

$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta)$.

Subcase 2.2. $s_1 > L$. Assume that $s_1 = L + g$, then $\delta = \delta' + g$. We have

$e_{n-1}(s_1) = [e_{n-2}(g) + g] + e_{n-1}(L)$.

We can show that $\delta' \ge (l-1)g$. Suppose not. We must have $m = (l'-1)L + \delta' + s_1 <
(l'-1)L + (l-1)g + L + g \le l(L + g) = l s_1 \le m$, which is impossible. We know that
$\delta' \le L$. If all dimensions in $C_{n-1}(\delta')$ have at least $l$ layers, then $\delta' > l^{n-2}(l-1) \ge L$, which
is impossible. So there must be a dimension in $C_{n-1}(\delta')$ which has fewer than $l$ layers. Since
$\delta' \ge (l-1)g$, there must be a layer with at least $g$ nodes. So we can move the layer of $g$
nodes together with its legs from $C_{n-1}(s_1)$ to $C_{n-1}(\delta')$ safely and get

$[e_{n-2}(g) + g] + e_{n-1}(\delta') \le e_{n-1}(\delta' + g)$.

Therefore,

$e_{n-1}(s_1) + e_{n-1}(\delta') = e_{n-1}(L) + [e_{n-2}(g) + g] + e_{n-1}(\delta')$

$\le e_{n-1}(L) + e_{n-1}(\delta' + g)$

$= e_{n-1}(L) + e_{n-1}(\delta)$. ■
6 Applications to Partitioning

The results so far, besides having theoretical interest, have practical applications to
partitioning. There are different ways in which k-ary n-cubes are appropriate descriptions of
parallel computations. One way is when at the lowest level the communication pattern of the
computation is that of a k-ary n-cube, e.g., some mesh-oriented computation with periodic
boundary conditions. Another is when the communication patterns reflect a k-ary n-cube
because the computation is about a k-ary n-cube. For instance, the computation may be a
direct-execution simulation of an application running on an architecture whose communication
network is $G_{k,n}$ [1, 5]. We partition the simulation in order to balance the simulation
workload and minimize communication overheads. Another instance is when the computation
is written as though it executes on all nodes of a k-ary n-cube architecture, but the program
is to be “folded” onto fewer processors, with subgraphs defined by the folding reflecting a set
of tasks that are multi-tasked on one node of an actual machine [7].

To illustrate these points we show how our results may be used in the context of
branch-and-bound algorithms for partitioning. Our object here is not to propose the specifics of
such an algorithm nor to study its performance. The ability to construct lower bounds on
communication costs based only on subgraph node size is one that can be used in a variety of
branch-and-bound formulations, and for a variety of partitioning problem formulations. We
will illustrate its use in one specific case.

The results can also be used to show the optimality of some curiously shaped partitions;
an example of this application is shown.

6.1 Lower Bounding in Branch-and-Bound 

Consider a data parallel computation whose communication structure can be viewed as a
k-ary n-cube, or a related structure. The nodes of the graph are weighted individually to reflect
computation costs; the edges of the graph are also weighted to reflect communication costs. It
is assumed that communication between co-resident nodes is free; alternatively, with minor
modifications one could model such internal communication with smaller, but nonzero,
costs. We wish to find a rectilinear partitioning [9] of the graph into P subgraphs such that
the bottleneck cost (the maximum, among all subgraphs, of the sum of the total node weights and
the total external edge weights of any subgraph) is minimized. A rectilinear partition is
one in which the separating cuts are all hyperplanes of the form $x_i = c_{ij}$, a constant. A
rectilinear partition of an 8 × 8 mesh is illustrated in Figure 6. Rectilinear partitions preserve
the nearest-neighbor communication structure of mesh-like communication patterns, as well
as having other desirable properties [9].

Our earlier work on rectilinear partitioning established that for dimensions larger than
two, the problem of finding an optimal partition is intractable. Furthermore, that work
did not explicitly include communication costs. The results in this paper can be used in
branch-and-bound algorithms [3] for finding rectilinear partitions, as we now show.

Figure 6: Rectilinear partition of an 8 × 8 mesh

A node in the branch-and-bound search tree reflects a set of cuts already made; the initial
node is empty. The children of a node reflect various ways of choosing one additional cut.
If there are c cuts to be made, the search tree has depth c + 1. Every solution is a leaf of
the search tree. We assume that the relative positioning of the cut associated with a level
is known a priori, e.g., the cut in the third dimension whose cut coordinate is fifth smallest.
Selecting the cut order is part of the branch-and-bound solution, but our focus here is on the
lower bounding function needed for the branch-and-bound approach.

With every node N in the search tree we associate a value bnd(N) that provides a lower
bound on the bottleneck cost of any solution rooted at that node. bnd(N) can be used to
direct the search in different ways, e.g., in choosing the next node to explore or in pruning the
search beyond that node because a known solution is better than any solution rooted at N.
We are interested in defining an easily computed function bnd(N). Each node N reflects the
partitioning of the graph into some number of regions; furthermore, under our assumptions
we know how many further divisions will be applied to each region. Consider a region R to
be further divided into s subregions. Suppose that the number of nodes in region R is r, that
the sum of all node weights in R is $W_R$, and that the weights of all edges with at least
one endpoint in R are sorted in a list E in non-decreasing order.

We wish to construct a lower bound lb(R) on the minimal bottleneck cost due to any
possible subdivision of R into s subregions. The method we use relies on an ability to compute
sizes of subregions $m_1, m_2, \ldots, m_s$, with $m_i \ge 1$ for all i and $\sum_{i=1}^{s} m_i = r$, such that
$\sum_{i=1}^{s} C(m_i)$ is minimized, where $C(m_i)$ is the cost (external edge count) of an optimal
subgraph with $m_i$ nodes. Note that since all nodes in a k-ary n-cube have the same degree d,
which is n for k = 2 and 2n for $k \ge 3$, we have that $C(m_i) = d m_i - 2 e_k(m_i, n)$. The solution to
this minimization problem, even when modified to include a constraint $m_i \le B$ for all i, is
straightforward using dynamic programming.
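
The dynamic program mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the cost function is passed in as a parameter, and the hypercube cost used in the example assumes the classical result that the maximum edge count of an m-node subgraph of a hypercube equals the sum of the binary digit sums of 0, ..., m - 1.

```python
def hypercube_external_edges(m, n):
    """External edge count C(m) = n*m - 2*e for an m-node subgraph of the n-cube.

    Assumption: e is the sum of the binary digit sums of 0..m-1, the classical
    maximum edge count for m nodes of a hypercube.
    """
    e = sum(bin(x).count("1") for x in range(m))
    return n * m - 2 * e


def min_total_cost(r, s, cost, B=None):
    """Split r nodes into s parts (1 <= m_i <= B) minimizing sum(cost(m_i)).

    Returns (minimal total, part sizes in non-increasing order), or None if
    no feasible split exists.
    """
    if B is None:
        B = r
    INF = float("inf")
    # best[t][x]: minimal total cost of splitting x nodes into t parts.
    best = [[INF] * (r + 1) for _ in range(s + 1)]
    choice = [[0] * (r + 1) for _ in range(s + 1)]
    best[0][0] = 0
    for t in range(1, s + 1):
        for x in range(t, r + 1):
            for m in range(1, min(B, x - (t - 1)) + 1):
                cand = best[t - 1][x - m] + cost(m)
                if cand < best[t][x]:
                    best[t][x] = cand
                    choice[t][x] = m
    if best[s][r] == INF:
        return None
    sizes, x = [], r
    for t in range(s, 0, -1):  # walk the recorded choices back to the start
        sizes.append(choice[t][x])
        x -= choice[t][x]
    return best[s][r], sorted(sizes, reverse=True)
```

For the 3-cube split into two parts, the unconstrained minimum isolates a single node, while the constraint B = 4 forces the even split; the table has s * r entries and each entry scans at most B part sizes.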

The bound construction of lb(R) has three phases. First, we compute the vector
$m = (m_1, \ldots, m_s)$ that minimizes $\sum_{i=1}^{s} C(m_i)$; this reflects an idealized assignment of numbers
of graph nodes to processors in such a way that the total number of edges cut (summed
over all processors) is minimized. Second, we compute a vector w whose i-th component
($w_i$) is the sum of the weights of the first $C(m_i)$ edges in E. w reflects lower bounds on
communication costs under assignment m. Without loss of generality suppose that $w_1$ is the
largest component. We define the slack of w as

$$\mathrm{slack}(w) = \sum_{i=2}^{s} (w_1 - w_i).$$

Figure 7: Computation of lower bound on bottleneck cost

Third, we consider the following two cases.

The first case of interest is when $\mathrm{slack}(w) \le W_R$. This means that if we treat the total
computational workload $W_R$ as divisible into arbitrary pieces, we can give each processor
except the first enough workload to bring its total cost up to $w_1$ and still have workload
remaining. The remainder may be divided evenly among the s processors. This is illustrated
in Figure 7(a). So

$$lb(R) = \frac{W_R + \sum_{i=1}^{s} w_i}{s}.$$

The correctness of the bound is evident from the fact that the total load (sum of computation
and communication) is minimized, and that no processor is ever idle.
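
As a small illustration of the second and third phases, the sketch below computes the weight vector w, its slack, and the first-case bound. The function names are hypothetical, and the inputs (the external edge counts $C(m_i)$ and the sorted edge list E) are assumed to have been computed already.

```python
def comm_weights(costs, sorted_edge_weights):
    """w_i = sum of the C(m_i) smallest edge weights (E is sorted non-decreasing)."""
    return [sum(sorted_edge_weights[:c]) for c in costs]


def slack(w):
    """Total workload needed to raise every component of w to the largest one."""
    top = max(w)
    return sum(top - wi for wi in w)


def case1_bound(total_node_weight, w):
    """Return (W_R + sum(w)) / s when slack(w) <= W_R, else None (second case)."""
    if slack(w) > total_node_weight:
        return None
    return (total_node_weight + sum(w)) / len(w)
```

With E = [1, 1, 2, 2, 3, 3] and external edge counts (3, 2), the weight vector is (4, 2) and its slack is 2, so a workload of 10 falls in the first case and a workload of 1 does not.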

The second case occurs when $\mathrm{slack}(w) > W_R$, as illustrated by Figure 7(b). In this case the
bottleneck is entirely communication induced, and the maximum number of nodes assigned
to a processor must be driven down. This may increase the total communication cost, but
will decrease the bottleneck cost. To reduce the bottleneck cost we constrain the assignment
by $m_i \le B$ for all i; for each B considered we may compute the slack of the corresponding
weight vector, and determine whether it exceeds $W_R$. Using a binary search on B we may
find the least value $B^*$ such that the corresponding slack exceeds $W_R$. Let $w = (w_1, \ldots, w_s)$
and $w' = (w'_1, \ldots, w'_s)$ be the weight vectors derived from using $B^* - 1$ and $B^*$ as constraints,
respectively. Then we take the lower bound to be

$$lb(R) = \min\left\{ \frac{W_R + \sum_{i=1}^{s} w_i}{s},\; w'_1 \right\}.$$

We need not consider any bottleneck derived from using $B > B^*$, since the bottleneck cost
is monotone non-decreasing in $\max\{m_i\}$, which is monotone non-decreasing in B. We need
not consider any bottleneck derived from using $B < B^* - 1$, since in this case no processor is
idle, and the total communication cost is at least as large as that derived from using $B^* - 1$.

Clearly the solution of the dynamic programming equations is the most expensive part of
this bound construction. It may be avoided by using lower bounds on the external edge count
function C that have concave closed-form expressions. Such bounds have been developed in
[10]:

$$B_k(m_i, n) = \begin{cases}
m_i n - m_i \log m_i & \text{for } k = 2,\ m_i \le 2^n; \\
2 m_i n - m_i \log m_i & \text{for } k > 2,\ m_i \le 2^n; \\
2 m_i n - n\left(m_i - m_i^{(n-1)/n}\right) & \text{for } k > 2,\ m_i > 2^n.
\end{cases}$$

Since $B_k(m_i, n)$ is concave in $m_i$, the theory of majorization [6] tells us that to
minimize $\sum_{i=1}^{s} B_k(m_i, n)$ subject to $1 \le m_i \le B$ and $\sum_{i=1}^{s} m_i = r$ we assign $m_i = B$ for
$i = 1, 2, \ldots, \lfloor (r - s)/(B - 1) \rfloor$, with $m_{\lfloor (r-s)/(B-1) \rfloor + 1} = (r - s) \bmod (B - 1) + 1$, and $m_i = 1$
for the remainder.
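
The assignment rule just stated is easy to write down directly. The sketch below is only an illustration with a hypothetical function name; it assumes $B \ge 2$ and $s \le r \le sB$, and it handles the boundary case in which every part receives B nodes, where the "leftover" part does not exist.

```python
def concave_assignment(r, s, B):
    """Part sizes for minimizing a sum of concave costs, 1 <= m_i <= B, sum = r.

    Follows the rule in the text: as many parts as possible get B nodes, one
    part takes the leftover, and the remaining parts get a single node each.
    """
    if B < 2 or not (s <= r <= s * B):
        raise ValueError("need B >= 2 and s <= r <= s * B")
    full, extra = divmod(r - s, B - 1)
    sizes = [B] * full
    if full < s:  # there is room for a leftover part and single-node parts
        sizes.append(extra + 1)
        sizes.extend([1] * (s - full - 1))
    return sizes
```

For example, r = 11 nodes, s = 4 parts and B = 4 give the sizes (4, 4, 2, 1).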

The procedure above shows how to bound from below the potential least bottleneck cost
for each region reflected by node N. Applying this method to each such region, we define
bnd(N) as the greatest of these lower bounds, i.e.,

$$bnd(N) = \max_{R} \{ lb(R) \}.$$


It should be noted that for a given number of processors P and a given total workload
$W_R$, the assignment problem whose minimized bottleneck cost is least is not necessarily one
where the workload is spread evenly. For instance, consider an 8 x 8 torus to be partitioned
into two regions. If each node has weight 4 and each edge has weight 1, then the optimal
solution is to bisect the graph into two equal pieces, at a cost of 4 x 32 + 8 = 136. However, the
graph that weights one node by 128 and all other nodes by 128/63 is optimally partitioned by
isolating the heavy node, at a cost of 128 + 4 = 132. The realization that minimized bottleneck
costs need not be associated with an evenly spread workload (and equi-partitions) leads us to
the careful construction of bnd(N) given.


6.2 Identification of Optimal Partitions 

Another application of our results is to identify optimal partitions (with respect to the
bottleneck metric), even when those partitions are not entirely regular. Consider the problem of
partitioning $G_{8,2}$ (an 8 x 8 torus) into 13 subgraphs, assuming that all nodes have common
computation weight w and all edges have unit communication cost. The problem clearly does
not divide evenly. The minimal cost to a processor of having m nodes is $wm + C(m)$, where
$C(m)$, the external edge count of the optimal subgraph with m nodes, is $4m - 2e_8(m, 2)$; note
that the cost function increases monotonically in m.

The processor with the most nodes assigned will have at least $\lceil 64/13 \rceil = 5$ nodes. The
optimal subgraph of $G_{8,2}$ with 5 nodes is a square with an attached singleton node. As
illustrated in Figure 8, it is possible to nearly tessellate $G_{8,2}$ with this optimal subgraph, the
only exception being one subgraph (the center square), which is itself a subgraph of the optimal
subgraph. The optimality of this partition derives from the fact that $wm + C(m)$ is monotone
non-decreasing in m, so that the bottleneck cost $\max\{w m_1 + C(m_1), \ldots, w m_{13} + C(m_{13})\}$ is
minimized when the $m_i$'s are nearly equal. The partition shown achieves the lower bound of
$5w + C(5) = 5w + 10$.

Figure 8: Optimal partition of $G_{8,2}$ into 13 subgraphs

There is clearly a general principle at work here for uniformly weighted graphs. If there
are M nodes to be assigned to P processors, then at least one processor will receive at least
$m = \lceil M/P \rceil$ nodes. When the processor cost function is monotone non-decreasing as a function of
the number of nodes assigned to it, $wm + C(m)$ is a lower bound on the optimal bottleneck
cost, C being the appropriate minimized function for communication cost. If it is possible
to partition the graph so that no processor has cost greater than $wm + C(m)$, then that
partition is optimal.
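
For small cases the bound $wm + C(m)$ can be checked by exhaustive search. The sketch below is a hedged illustration for a k x k torus with hypothetical function names: it enumerates only connected subsets containing node (0, 0), which relies on the symmetry of the torus and on the assumption that an edge-maximal subset may be taken to be connected. For the 8 x 8 torus and m = 5 it reproduces the value C(5) = 10 used above.

```python
from math import ceil


def torus_neighbors(node, k):
    """The four neighbors of a node in the k x k torus (assumes k >= 3)."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def max_internal_edges(m, k):
    """Largest edge count among connected m-node subsets of the k x k torus.

    By symmetry only subsets containing node (0, 0) are enumerated.
    """
    frontier = {frozenset([(0, 0)])}
    for _ in range(m - 1):
        # Grow every subset by one adjacent node.
        frontier = {
            subset | {nb}
            for subset in frontier
            for node in subset
            for nb in torus_neighbors(node, k)
            if nb not in subset
        }
    return max(
        # Each internal edge is seen from both endpoints, hence the halving.
        sum(1 for a in subset for b in torus_neighbors(a, k) if b in subset) // 2
        for subset in frontier
    )


def bottleneck_lower_bound(M, P, w, k):
    """w*m + C(m) with m = ceil(M/P) and C(m) = 4m - 2 * max_internal_edges(m, k)."""
    m = ceil(M / P)
    return w * m + 4 * m - 2 * max_internal_edges(m, k)
```

The search grows quickly with m, so it is only practical for subgraphs of a handful of nodes.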

7 Conclusions 

A subgraph of a k-ary n-cube can be viewed as having internal edges and external edges.
This paper describes how to construct subgraphs that are optimal in the sense of maximizing
the number of internal edges, thus minimizing the number of external edges, given m nodes
in the subgraph. While these results have combinatorial interest, they also have serious
applications to problems in parallel processing. We show, for instance, how to apply these
results in the context of branch-and-bound algorithms for partitioning a k-ary n-cube whose
nodes and edges have general (positive) weights. Lower bounds lie at the heart of any
branch-and-bound algorithm, and our results provide the critical means needed to compute sharper
bounds than those that ignore communication overheads. We also show how our results can
be used to demonstrate the optimality of certain irregular partitions. k-ary n-cubes arise
frequently in studies of parallel processing. The results and applications developed here help
us to better understand these important graphs.

Appendix 4 

Property 1.5 In each i-th composite subcube ($0 \le i \le k - 1$) of type $G_{k,n-1}$ in $G_{k,n}$, choose
$m_i$ nodes, and define $m = \sum_{i=0}^{k-1} m_i$. The number of edges with endpoints among these m
nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$ for k = 2, and is no
larger than $m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for $k \ge 3$.

4 To referees: Proofs in this section have all been verified by programs.

Proof We observe that if the k composite subcubes of type $G_{k,n-1}$ are placed from left to
right, any node in one composite subcube is connected to exactly one node in its neighboring
composite subcubes. When k = 2, it is trivial that the number of edges with endpoints
among the m nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$. Now
consider $k \ge 3$. Clearly, the number of edges with endpoints among the m nodes but in
different composite subcubes is no larger than

$$\min\{m_0, m_1\} + \min\{m_1, m_2\} + \cdots + \min\{m_{k-2}, m_{k-1}\} + \min\{m_{k-1}, m_0\}.$$

Define $i \oplus 1 = (i + 1) \bmod k$ and $i \ominus 1 = (i - 1) \bmod k$. Let $m_p = \max_{0 \le i \le k-1}\{m_i\}$ and
$m_q = \min_{0 \le i \le k-1}\{m_i\}$. Place the k pairs $(m_0, m_1), (m_1, m_2), \ldots, (m_{k-2}, m_{k-1}), (m_{k-1}, m_0)$ in a
circle clockwise. Cut the circle into two chains $C_1$ and $C_2$ such that
$C_1 = \{(m_p, m_{p \oplus 1}), \ldots, (m_{q \ominus 1}, m_q)\}$ and $C_2 = \{(m_q, m_{q \oplus 1}), \ldots, (m_{p \ominus 1}, m_p)\}$. Clearly,

$$\sum_{(m_i, m_{i \oplus 1}) \in C_1} \min\{m_i, m_{i \oplus 1}\} \le \sum_{i = p \oplus 1}^{q} m_i$$

and

$$\sum_{(m_i, m_{i \oplus 1}) \in C_2} \min\{m_i, m_{i \oplus 1}\} \le \sum_{i = q}^{p \ominus 1} m_i,$$

where the index i in each sum on the right runs cyclically. Consequently,

$$\begin{aligned}
\sum_{i=0}^{k-1} \min\{m_i, m_{i \oplus 1}\}
&= \sum_{(m_i, m_{i \oplus 1}) \in C_1} \min\{m_i, m_{i \oplus 1}\} + \sum_{(m_i, m_{i \oplus 1}) \in C_2} \min\{m_i, m_{i \oplus 1}\} \\
&\le \sum_{i = p \oplus 1}^{q} m_i + \sum_{i = q}^{p \ominus 1} m_i \\
&= \sum_{i=0}^{k-1} m_i - m_p + m_q \\
&= m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}. \qquad ■
\end{aligned}$$
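
The inequality at the core of this proof lends itself to an exhaustive check on small cases. The sketch below is not the authors' verification program; it is a minimal brute-force test with hypothetical function names that tries every vector of node counts up to a chosen size.

```python
from itertools import product


def cross_edge_bound_holds(counts):
    """Check sum_i min(m_i, m_{i+1 mod k}) <= m - max(m_i) + min(m_i) for k >= 3."""
    k = len(counts)
    ring_sum = sum(min(counts[i], counts[(i + 1) % k]) for i in range(k))
    return ring_sum <= sum(counts) - max(counts) + min(counts)


def check_all(k, max_count):
    """Test every vector (m_0, ..., m_{k-1}) with entries in 0..max_count."""
    return all(
        cross_edge_bound_holds(c) for c in product(range(max_count + 1), repeat=k)
    )
```

The vector (2, 2, 1) attains the bound with equality: the ring sum is 2 + 1 + 1 = 4 and m - max + min = 5 - 2 + 1 = 4.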


Lemma 2.1 $W(i, 2i - 1) = W(0, i - 1) + i$ for $i \ge 1$.

Proof We induct on i. When i = 1, it is trivial that $W(1, 1) = W(0, 0) + 1$. Assume that
the equation holds for $\le i - 1$. Now consider i.

$$\begin{aligned}
W(i, 2i - 1) &= W(i - 1, 2i - 3) + w(2i - 2) + w(2i - 1) - w(i - 1) \\
&= W(0, i - 2) + (i - 1) + w(2i - 2) + w(2i - 1) - w(i - 1) && \text{(Inductive hypothesis)} \\
&= W(0, i - 2) + (i - 1) + w(2i - 1) && \text{(Since } w(2i - 2) = w(i - 1)\text{)} \\
&= W(0, i - 2) + (i - 1) + w(i - 1) + 1 && \text{(Since } w(2i - 1) = w(i - 1) + 1\text{)} \\
&= W(0, i - 1) + i. \qquad ■
\end{aligned}$$

Lemma 2.2 $W(i + 1, 2i) = W(0, i - 1) + i$ for $i \ge 1$.

Proof Straightforward by using Lemma 2.1. ■
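
Both identities can be spot-checked numerically. The sketch below assumes that w(x) is the number of 1 bits in the binary representation of x and that W(a, b) is the sum of w over a, ..., b; this reading is consistent with the identities w(2i - 2) = w(i - 1) and w(2i - 1) = w(i - 1) + 1 used in the proof, but the definitions themselves appear earlier in the paper.

```python
def w(x):
    """Assumed digit weight: number of 1 bits in the binary form of x."""
    return bin(x).count("1")


def W(a, b):
    """Sum of w(x) over a <= x <= b (an empty sum if a > b)."""
    return sum(w(x) for x in range(a, b + 1))


def lemma_2_1(i):
    """W(i, 2i - 1) = W(0, i - 1) + i."""
    return W(i, 2 * i - 1) == W(0, i - 1) + i


def lemma_2_2(i):
    """W(i + 1, 2i) = W(0, i - 1) + i."""
    return W(i + 1, 2 * i) == W(0, i - 1) + i
```

For i = 3, for instance, W(3, 5) = 2 + 1 + 2 = 5 and W(0, 2) + 3 = 2 + 3 = 5.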

Lemma 2.3 $W(j, j + i - 1) \ge W(0, i - 1) + i$ for $j \ge i \ge 1$.

Proof We induct on i. When i = 1, it is obvious that $W(j, j) \ge W(0, 0) + 1$. Assume that
the inequality holds for $\le i - 1$. That is, for $j' \ge i' \ge 1$ and $i' \le i - 1$,

$$W(j', j' + i' - 1) \ge W(0, i' - 1) + i'. \tag{1}$$

An important implication of the inductive hypothesis is that when $j' + i' \le 2^b$ for some b, if
we replace all parameters of W in (1) by their $(2^b - 1)$-complements, we have

$$W(2^b - i', 2^b - 1) \ge W(2^b - j' - i', 2^b - j' - 1) + i'. \tag{2}$$

Now consider i.

Case 1. There exists $2^b$ in $(j, j + i - 1]$ for some b, and $j + i - 1$ also has b + 1 bits and
starts with 1 in its base-2 representation. By (1) and (2),

$$W(j, 2^b - 1) \ge W(j + i - 2^b, i - 1) + (2^b - j). \tag{3}$$

Removing the highest bit 1 from $2^b, \ldots, j + i - 1$,

$$W(2^b, j + i - 1) = W(0, j + i - 2^b - 1) + (j + i - 2^b). \tag{4}$$

Adding (3) and (4),

$$W(j, j + i - 1) \ge W(0, i - 1) + i.$$

Case 2. There is no number equal to $2^b$ in $(j, j + i - 1]$ for any b. We then know that
$j, \ldots, j + i - 1$ must all have the same number of bits, say b + 1, and the same highest bit
1. Let $p_1 \cdots p_t$ be the longest common prefix of the base-2 representations of $j, \ldots, j + i - 1$.
Let $p = p_1 \cdot 2^b + \cdots + p_t \cdot 2^{b-t+1}$. Clearly, $p \le j$ and $p_1 \ge 1$. Removing the highest t bits
$p_1 \cdots p_t$ from $j, \ldots, j + i - 1$,

$$W(j, j + i - 1) \ge W(j - p, j + i - p - 1) + i.$$

Now we wish to show that $W(j - p, j + i - p - 1) \ge W(0, i - 1)$, or equivalently
$W(j - p, j + i - p - 1) - W(0, i - 1) \ge 0$. If $j - p < i$,

$$\begin{aligned}
W(j - p, j + i - p - 1) - W(0, i - 1) &= W(i, j + i - p - 1) - W(0, j - p - 1) \\
&\ge j - p && \text{(By (1))} \\
&\ge 0.
\end{aligned}$$

If $j - p \ge i$, there must exist $2^{b'}$ in $(j - p, j + i - p - 1]$ for some $b' < b$ since the highest bit
in $j - p$ is not the same as the highest bit in $j + i - p - 1$. By Case 1,

$$W(j - p, j + i - p - 1) - W(0, i - 1) \ge i \ge 0. \qquad ■$$
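
A finite range of Lemma 2.3 can be checked directly. The sketch below is an illustration, not the authors' program: it assumes w(x) is the number of 1 bits of x, uses a hypothetical helper for prefix sums so that each W(a, b) costs constant time, and tests every pair with $j \ge i \ge 1$ up to the given limits.

```python
from itertools import accumulate


def popcount_prefix(limit):
    """prefix[x] = w(0) + ... + w(x - 1), with w(x) assumed to count 1 bits."""
    return [0] + list(accumulate(bin(x).count("1") for x in range(limit)))


def lemma_2_3_holds(max_i, max_j):
    """Check W(j, j + i - 1) >= W(0, i - 1) + i for 1 <= i <= max_i, i <= j <= max_j."""
    prefix = popcount_prefix(max_i + max_j + 1)

    def W(a, b):
        return prefix[b + 1] - prefix[a]

    return all(
        W(j, j + i - 1) >= W(0, i - 1) + i
        for i in range(1, max_i + 1)
        for j in range(i, max_j + 1)
    )
```

The bound is tight at j = i (Lemma 2.1), so the inequality cannot be strengthened in general.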

Lemma 3.1 $Z(0, 3i - 1) = 3Z(0, i - 1) + 3i$ for $i \ge 1$.

Proof We induct on i. When i = 1, $Z(0, 2) = 3Z(0, 0) + 3 = 3$. Assume that the equation
holds for $\le i - 1$. Now consider i.

$$\begin{aligned}
Z(0, 3i - 1) &= Z(0, 3i - 4) + z(3i - 3) + z(3i - 2) + z(3i - 1) \\
&= 3Z(0, i - 2) + 3(i - 1) + z(3i - 3) + z(3i - 2) + z(3i - 1) && \text{(Inductive hypothesis)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(3i - 2) + z(3i - 1) && \text{(Since } z(3i - 3) = z(i - 1)\text{)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(i - 1) + 1 + z(3i - 1) && \text{(Since } z(3i - 2) = z(i - 1) + 1\text{)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(i - 1) + 1 + z(i - 1) + 2 && \text{(Since } z(3i - 1) = z(i - 1) + 2\text{)} \\
&= 3Z(0, i - 1) + 3i. \qquad ■
\end{aligned}$$

Lemma 3.2 $Z(0, 3i) = Z(0, i) + 2Z(0, i - 1) + 3i$ for $i \ge 1$.

Proof Straightforward by using Lemma 3.1. ■

Lemma 3.3 $Z(0, 3i + 1) = 2Z(0, i) + Z(0, i - 1) + 3i + 1$ for $i \ge 1$.

Proof Straightforward by using Lemma 3.2. ■
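
These three identities can also be confirmed numerically. The sketch below assumes that z(x) is the sum of the base-3 digits of x and that Z(a, b) sums z over a, ..., b, which matches the identities z(3i - 3) = z(i - 1), z(3i - 2) = z(i - 1) + 1 and z(3i - 1) = z(i - 1) + 2 used in the proof of Lemma 3.1; the function names are chosen for illustration.

```python
def z(x):
    """Assumed digit weight: sum of the base-3 digits of x."""
    total = 0
    while x:
        total += x % 3
        x //= 3
    return total


def Z(a, b):
    """Sum of z(x) over a <= x <= b (an empty sum if a > b)."""
    return sum(z(x) for x in range(a, b + 1))


def lemma_3_1(i):
    return Z(0, 3 * i - 1) == 3 * Z(0, i - 1) + 3 * i


def lemma_3_2(i):
    return Z(0, 3 * i) == Z(0, i) + 2 * Z(0, i - 1) + 3 * i


def lemma_3_3(i):
    return Z(0, 3 * i + 1) == 2 * Z(0, i) + Z(0, i - 1) + 3 * i + 1
```

Lemma 3.2 follows from Lemma 3.1 by adding z(3i) = z(i) to both sides, and Lemma 3.3 follows from Lemma 3.2 by adding z(3i + 1) = z(i) + 1.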

Lemma 3.4 $Z(j, j + i - 1) \ge Z(0, i - 1) + i$ for $j \ge i \ge 1$.

Proof Use the proof of Lemma 2.3, but change W to Z and base-2 to base-3. To be more
specific, in the inductive step, consider the following two cases.

Case 1. There exists $3^b$ or $2 \cdot 3^b$ in $(j, j + i - 1]$ for some b, and $j + i - 1$ also has b + 1
bits and starts with 1 or 2, respectively, in its base-3 representation.

Case 2. There is no number equal to $3^b$ or $2 \cdot 3^b$ in $(j, j + i - 1]$ for any b. ■
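
As with Lemma 2.3, a finite range of this inequality can be tested by brute force. The sketch assumes z(x) is the base-3 digit sum of x; the helper names are hypothetical, and prefix sums keep each range query at constant cost.

```python
from itertools import accumulate


def digit_sum_base3(x):
    """Sum of the base-3 digits of x (the assumed meaning of z)."""
    total = 0
    while x:
        total += x % 3
        x //= 3
    return total


def lemma_3_4_holds(max_i, max_j):
    """Check Z(j, j + i - 1) >= Z(0, i - 1) + i for 1 <= i <= max_i, i <= j <= max_j."""
    limit = max_i + max_j + 1
    prefix = [0] + list(accumulate(digit_sum_base3(x) for x in range(limit)))

    def Z(a, b):
        return prefix[b + 1] - prefix[a]

    return all(
        Z(j, j + i - 1) >= Z(0, i - 1) + i
        for i in range(1, max_i + 1)
        for j in range(i, max_j + 1)
    )
```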

Before proving Lemma 3.5, we need the following claims.

Claim 1 $Z(j, j + i - 1) \ge Z(0, i - 1)$ for $j \ge 0$ and $i \ge 1$.

Proof Trivial by Lemma 3.4. ■

Claim 2 $Z(l - i, l - 1) \ge Z(l - j - i, l - j - 1) + i$ for $j \ge i \ge 1$, $j + i \le l$ and $l = 3^b$ or $2 \cdot 3^b$
for some b.

Proof Replace all parameters of Z in Lemma 3.4 by their $(l - 1)$-complements. ■

Claim 3 $Z(l - i, l - 1) \ge Z(l - j - i, l - j - 1)$ for $j \ge 0$, $i \ge 1$, $j + i \le l$ and $l = 3^b$ or $2 \cdot 3^b$
for some b.

Proof Replace all parameters of Z in Claim 1 by their $(l - 1)$-complements. ■

Lemma 3.5 $Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2$ for $j \ge i_1 \ge i_2 \ge 1$.

Proof We induct on $i_1 + i_2$. When $i_1 + i_2 = 2$, we must have $i_1 = i_2 = 1$. It is obvious that
$Z(j, j + 1) \ge Z(0, 0) + Z(0, 0) + 3$. Assume that the inequality holds for $\le i_1 + i_2 - 1$. That
is, for $j' \ge i'_1 \ge i'_2 \ge 1$ and $i'_1 + i'_2 \le i_1 + i_2 - 1$,

$$Z(j', j' + i'_1 + i'_2 - 1) \ge Z(0, i'_1 - 1) + Z(0, i'_2 - 1) + i'_1 + 2i'_2. \tag{5}$$

An important implication of the inductive hypothesis is that when $j' + i'_1 + i'_2 \le l$ and $l = 3^b$
or $2 \cdot 3^b$ for some b, if we replace all parameters of Z in (5) by their $(l - 1)$-complements, we
have

$$Z(l - i'_1, l - 1) + Z(l - i'_2, l - 1) \ge Z(l - j' - i'_1 - i'_2, l - j' - 1) + i'_1 + 2i'_2. \tag{6}$$

Now consider $i_1 + i_2$.

Case 1. There exists $2 \cdot 3^b$ in $(j, j + i_1 + i_2 - 1]$ for some b, and $j + i_1 + i_2 - 1$ also has
b + 1 bits and starts with 2 in its base-3 representation.

Subcase 1.1. There is $3^b$ in $(j, 2 \cdot 3^b)$. Removing the highest bit 1 from $3^b, \ldots, 3^b + i_1 - 1$,

$$Z(3^b, 3^b + i_1 - 1) = Z(0, i_1 - 1) + i_1. \tag{7}$$

Removing the highest bit 2 from $2 \cdot 3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(2 \cdot 3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 + i_2 - 2 \cdot 3^b). \tag{8}$$

Removing the highest bit 1 from $3^b + i_1, \ldots, 2 \cdot 3^b - 1$,

$$Z(3^b + i_1, 2 \cdot 3^b - 1) = Z(i_1, 3^b - 1) + (3^b - i_1). \tag{9}$$

Next,

$$\begin{aligned}
& Z(j, 3^b - 1) + Z(3^b + i_1, 2 \cdot 3^b - 1) \\
&\quad = Z(j, 3^b - 1) + Z(i_1, 3^b - 1) + (3^b - i_1) && \text{(By (9))} \\
&\quad \ge Z(j + i_1 + i_2 - 2 \cdot 3^b, i_2 - 1) + (3^b - i_1) + 2(3^b - j) + (3^b - i_1) && \text{(By (5) (6))}.
\end{aligned} \tag{10}$$

Adding (7), (8) and (10),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subcase 1.2. There is no number equal to $3^b$ in $(j, 2 \cdot 3^b)$. We then know that j must have
b + 1 bits and be at least $3^b$.

Subsubcase 1.2.1. Assume that $j + i_1 + i_2 - 2 \cdot 3^b \ge i_2$. Removing the highest bit 2 from
$2 \cdot 3^b, \ldots, 2 \cdot 3^b + i_2 - 1$,

$$Z(2 \cdot 3^b, 2 \cdot 3^b + i_2 - 1) = Z(0, i_2 - 1) + 2i_2. \tag{11}$$

Removing the highest bit 2 from $2 \cdot 3^b + i_2, \ldots, j + i_1 + i_2 - 1$,

$$\begin{aligned}
Z(2 \cdot 3^b + i_2, j + i_1 + i_2 - 1) &= Z(i_2, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 - 2 \cdot 3^b) \\
&\ge Z(0, j + i_1 - 2 \cdot 3^b - 1) + 2(j + i_1 - 2 \cdot 3^b) && \text{(Claim 1)} \\
&= Z(2 \cdot 3^b, j + i_1 - 1).
\end{aligned} \tag{12}$$

Therefore,

$$\begin{aligned}
Z(j, j + i_1 + i_2 - 1)
&= Z(j, 2 \cdot 3^b - 1) + Z(2 \cdot 3^b, 2 \cdot 3^b + i_2 - 1) + Z(2 \cdot 3^b + i_2, j + i_1 + i_2 - 1) \\
&\ge Z(j, 2 \cdot 3^b - 1) + Z(0, i_2 - 1) + Z(2 \cdot 3^b, j + i_1 - 1) + 2i_2 && \text{(By (11) (12))} \\
&= Z(j, j + i_1 - 1) + Z(0, i_2 - 1) + 2i_2 \\
&\ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2 && \text{(Lemma 3.4)}.
\end{aligned}$$

Subsubcase 1.2.2. Assume that $j + i_1 + i_2 - 2 \cdot 3^b < i_2$. Removing the highest bit 2 from
$2 \cdot 3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(2 \cdot 3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 + i_2 - 2 \cdot 3^b). \tag{13}$$

By Lemma 3.4,

$$Z(j, j + i_1 - 1) \ge Z(0, i_1 - 1) + i_1. \tag{14}$$

Removing the highest bit 1 from $j + i_1, \ldots, 2 \cdot 3^b - 1$,

$$\begin{aligned}
Z(j + i_1, 2 \cdot 3^b - 1) &= Z(j + i_1 - 3^b, 3^b - 1) + (2 \cdot 3^b - j - i_1) \\
&\ge Z(j + i_1 + i_2 - 2 \cdot 3^b, i_2 - 1) + 2(2 \cdot 3^b - j - i_1) && \text{(Claim 2)}.
\end{aligned} \tag{15}$$

Adding (13), (14) and (15),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Case 2. There exists $3^b$ in $(j, j + i_1 + i_2 - 1]$ for some b, and $j + i_1 + i_2 - 1$ also has b + 1
bits and starts with 1 in its base-3 representation.

Subcase 2.1. There is $2 \cdot 3^{b-1}$ in $(j, 3^b)$.

Subsubcase 2.1.1. Assume that $i_2 \le 3^{b-1}$. Removing the highest bit 2 from
$2 \cdot 3^{b-1}, \ldots, 2 \cdot 3^{b-1} + i_2 - 1$,

$$Z(2 \cdot 3^{b-1}, 2 \cdot 3^{b-1} + i_2 - 1) = Z(0, i_2 - 1) + 2i_2. \tag{16}$$

Removing the highest bit 1 from $3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b). \tag{17}$$

By Claim 2,

$$Z(j, 2 \cdot 3^{b-1} - 1) \ge Z(j + i_1 - 2 \cdot 3^{b-1}, i_1 - 1) + (2 \cdot 3^{b-1} - j) \tag{18}$$

and

$$Z(2 \cdot 3^{b-1} + i_2, 3^b - 1) \ge Z(j + i_1 + i_2 - 3^b, j + i_1 - 2 \cdot 3^{b-1} - 1) + (3^{b-1} - i_2). \tag{19}$$

Adding (16), (17), (18) and (19),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subsubcase 2.1.2. Assume that $i_2 > 3^{b-1}$. Then $j \ge i_1 \ge i_2 > 3^{b-1}$.

$$\begin{aligned}
& Z(j - 3^{b-1}, j + i_1 + i_2 - 3^b - 1) \\
&\quad \ge Z(0, i_1 - 3^{b-1} - 1) + Z(0, i_2 - 3^{b-1} - 1) + (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}) && \text{(By (5))}.
\end{aligned} \tag{20}$$

Let $h = \min\{3^b + 2 \cdot 3^{b-1}, j + i_1 + i_2\}$.

$$Z(h - 2 \cdot 3^{b-1}, h - 1) \ge Z(0, 3^{b-1} - 1) + Z(0, 3^{b-1} - 1) + 3^{b-1} + 2(3^{b-1}) \quad \text{(By (5))}. \tag{21}$$

Subtracting $3^{b-1}$ from $j, \ldots, h - 2 \cdot 3^{b-1} - 1$,

$$Z(j, h - 2 \cdot 3^{b-1} - 1) = Z(j - 3^{b-1}, h - 3^b - 1) + (h - j - 2 \cdot 3^{b-1}). \tag{22}$$

In the case of $h = 3^b + 2 \cdot 3^{b-1}$, removing the highest bit 1 from $3^b + 2 \cdot 3^{b-1}, \ldots, j + i_1 + i_2 - 1$,

$$Z(h, j + i_1 + i_2 - 1) = Z(2 \cdot 3^{b-1}, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b - 2 \cdot 3^{b-1}). \tag{23}$$

Adding (22) and (23),

$$\begin{aligned}
& Z(j, h - 2 \cdot 3^{b-1} - 1) + Z(h, j + i_1 + i_2 - 1) \\
&\quad = Z(j - 3^{b-1}, j + i_1 + i_2 - 3^b - 1) + (i_1 + i_2 - 2 \cdot 3^{b-1}) \\
&\quad \ge Z(0, i_1 - 3^{b-1} - 1) + Z(0, i_2 - 3^{b-1} - 1) + (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}) \\
&\qquad + (i_1 + i_2 - 2 \cdot 3^{b-1}) && \text{(By (20))} \\
&\quad = Z(3^{b-1}, i_1 - 1) + Z(3^{b-1}, i_2 - 1) + (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}).
\end{aligned} \tag{24}$$

Adding (21) and (24),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subcase 2.2. There is no number equal to $2 \cdot 3^{b-1}$ in $(j, 3^b)$. We then know that j must
be at least $2 \cdot 3^{b-1}$.

Subsubcase 2.2.1. Assume that $j + i_1 + i_2 - 3^b \ge i_1$. Removing the highest bit 1 from
$3^b, \ldots, 3^b + i_1 - 1$,

$$Z(3^b, 3^b + i_1 - 1) = Z(0, i_1 - 1) + i_1. \tag{25}$$

Removing the highest bit 1 from $3^b + i_1, \ldots, j + i_1 + i_2 - 1$,

$$\begin{aligned}
Z(3^b + i_1, j + i_1 + i_2 - 1) &= Z(i_1, j + i_1 + i_2 - 3^b - 1) + (j + i_2 - 3^b) \\
&\ge Z(0, j + i_2 - 3^b - 1) + 2(j + i_2 - 3^b) && \text{(Lemma 3.4)}.
\end{aligned} \tag{26}$$

Removing the highest bit 2 from $j, \ldots, 3^b - 1$,

$$\begin{aligned}
Z(j, 3^b - 1) &= Z(j - 2 \cdot 3^{b-1}, 3^{b-1} - 1) + 2(3^b - j) \\
&\ge Z(j + i_2 - 3^b, i_2 - 1) + 2(3^b - j) && \text{(Claim 3)}.
\end{aligned} \tag{27}$$

Adding (25), (26) and (27),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subsubcase 2.2.2. Assume that $j + i_1 + i_2 - 3^b < i_1$. Removing the highest bit 1 from
$3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b). \tag{28}$$

Removing the highest bit 2 from $j, \ldots, j + i_2 - 1$,

$$\begin{aligned}
Z(j, j + i_2 - 1) &= Z(j - 2 \cdot 3^{b-1}, j + i_2 - 2 \cdot 3^{b-1} - 1) + 2i_2 \\
&\ge Z(0, i_2 - 1) + 2i_2 && \text{(Claim 1)}.
\end{aligned} \tag{29}$$

By Claim 2,

$$Z(j + i_2, 3^b - 1) \ge Z(j + i_1 + i_2 - 3^b, i_1 - 1) + (3^b - j - i_2). \tag{30}$$

Adding (28), (29) and (30),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Case 3. There is no number equal to $3^b$ or $2 \cdot 3^b$ in $(j, j + i_1 + i_2 - 1]$ for any b. We then
know that $j, \ldots, j + i_1 + i_2 - 1$ must all have the same number of bits, say b + 1, and the same
highest bit 1 or 2. Let $p_1 \cdots p_t$ be the longest common prefix of the base-3 representations of
$j, \ldots, j + i_1 + i_2 - 1$. Let $p = p_1 \cdot 3^b + \cdots + p_t \cdot 3^{b-t+1}$. Clearly, $p \le j$ and $p_1 \ge 1$. Removing
the highest t bits $p_1 \cdots p_t$ from $j, \ldots, j + i_1 + i_2 - 1$,

$$Z(j, j + i_1 + i_2 - 1) \ge Z(j - p, j + i_1 + i_2 - p - 1) + (i_1 + i_2).$$

Now we wish to show that $Z(j - p, j + i_1 + i_2 - p - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_2$, or
equivalently, $Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1) \ge i_2$. If $j - p < i_1$,

$$\begin{aligned}
& Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1) \\
&\quad = Z(i_1, j + i_1 + i_2 - p - 1) - Z(0, j - p - 1) - Z(0, i_2 - 1) \\
&\quad \ge \max\{j - p, i_2\} + 2\min\{j - p, i_2\} && \text{(By (5))} \\
&\quad \ge i_2.
\end{aligned}$$

If $j - p \ge i_1$, there must exist $3^{b'}$ or $2 \cdot 3^{b'}$ in $(j - p, j + i_1 + i_2 - p - 1]$ for some $b' < b$ since
the highest bit in $j - p$ is not the same as the highest bit in $j + i_1 + i_2 - p - 1$. By Cases 1
and 2,

$$Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1) \ge i_1 + 2i_2 \ge i_2. \qquad ■$$
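
Because the case analysis above is long, a numeric sanity check of the statement of Lemma 3.5 is a useful complement. The sketch below is not the authors' verification program; it assumes z(x) is the base-3 digit sum of x, uses hypothetical helper names, and tests all triples with $j \ge i_1 \ge i_2 \ge 1$ up to chosen limits.

```python
from itertools import accumulate


def digit_sum_base3(x):
    """Sum of the base-3 digits of x (the assumed meaning of z)."""
    total = 0
    while x:
        total += x % 3
        x //= 3
    return total


def lemma_3_5_holds(max_i, max_j):
    """Check the Lemma 3.5 inequality for max_i >= i1 >= i2 >= 1 and i1 <= j <= max_j."""
    limit = 2 * max_i + max_j + 1
    prefix = [0] + list(accumulate(digit_sum_base3(x) for x in range(limit)))

    def Z(a, b):
        return prefix[b + 1] - prefix[a]

    return all(
        Z(j, j + i1 + i2 - 1) >= Z(0, i1 - 1) + Z(0, i2 - 1) + i1 + 2 * i2
        for i1 in range(1, max_i + 1)
        for i2 in range(1, i1 + 1)
        for j in range(i1, max_j + 1)
    )
```

Equality occurs, for example, at $j = i_1 = i_2 = 3$, where both sides equal 15.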

Lemma 4.1 $W(0, 4i - 1) = 4W(0, i - 1) + 4i$ for $i \ge 1$.

Proof We induct on i. When i = 1, $W(0, 3) = 4W(0, 0) + 4 = 4$. Assume that the equation
holds for $\le i - 1$. Now consider i.

$$\begin{aligned}
W(0, 4i - 1) &= W(0, 4i - 5) + w(4i - 4) + w(4i - 3) + w(4i - 2) + w(4i - 1) \\
&= 4W(0, i - 2) + (4i - 4) + w(4i - 4) + w(4i - 3) + w(4i - 2) + w(4i - 1) && \text{(Inductive hypothesis)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(4i - 3) + w(4i - 2) + w(4i - 1) && \text{(Since } w(4i - 4) = w(i - 1)\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(4i - 2) + w(4i - 1) && \text{(Since } w(4i - 3) = w(i - 1) + 1\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(i - 1) + 1 + w(4i - 1) && \text{(Since } w(4i - 2) = w(i - 1) + 1\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(i - 1) + 1 + w(i - 1) + 2 && \text{(Since } w(4i - 1) = w(i - 1) + 2\text{)} \\
&= 4W(0, i - 1) + 4i. \qquad ■
\end{aligned}$$

Lemma 4.2 $W(0, 4i) = W(0, i) + 3W(0, i - 1) + 4i$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.1. ■

Lemma 4.3 $W(0, 4i + 1) = 2W(0, i) + 2W(0, i - 1) + 4i + 1$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.2. ■

Lemma 4.4 $W(0, 4i + 2) = 3W(0, i) + W(0, i - 1) + 4i + 2$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.3. ■
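
The four identities above can be confirmed with a few lines of code. The sketch assumes that w(x) is again the number of 1 bits of x, which agrees with the digit identities w(4i - 4) = w(i - 1), w(4i - 3) = w(4i - 2) = w(i - 1) + 1 and w(4i - 1) = w(i - 1) + 2 used in the proof of Lemma 4.1; the helper names are hypothetical.

```python
def W(a, b):
    """Sum over a <= x <= b of the number of 1 bits of x (assumed meaning of w)."""
    return sum(bin(x).count("1") for x in range(a, b + 1))


def lemmas_4_hold(i):
    """Check Lemmas 4.1 to 4.4 for a single value of i >= 1."""
    base, prev = W(0, i), W(0, i - 1)
    return (
        W(0, 4 * i - 1) == 4 * prev + 4 * i                     # Lemma 4.1
        and W(0, 4 * i) == base + 3 * prev + 4 * i              # Lemma 4.2
        and W(0, 4 * i + 1) == 2 * base + 2 * prev + 4 * i + 1  # Lemma 4.3
        and W(0, 4 * i + 2) == 3 * base + prev + 4 * i + 2      # Lemma 4.4
    )
```

Each identity follows from the previous one by adding a single term: w(4i) = w(i), w(4i + 1) = w(i) + 1 and w(4i + 2) = w(i) + 1.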